{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #1\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics:\n",
    "- The basic unit of computation, the Tensor and operations\n",
    "- How to create and optimize trainable Variables\n",
    "- Gradient descent optimization\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np       # to use numpy arrays\n",
    "import tensorflow as tf  # to specify and run computation graphs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tensor\n",
    "\n",
    "The basic unit of data in TensorFlow is the [`tf.Tensor`](https://www.tensorflow.org/api_docs/python/tf/Tensor). A tensor is a multi-dimensional array of numerical variable specialized for numerical computation with an underlying data type (think `float` or `int`). Here are some examples of tensors we'll create with [`tf.convert_to_tensor`](https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor), or [`tf.ones`](https://www.tensorflow.org/api_docs/python/tf/ones) or `tf.zeros` (which have the same interface)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# a 0-dim tensor; a scalar with shape ()\n",
    "its_complicated = tf.convert_to_tensor(12.3 - 4.85j, tf.complex64)\n",
    "print(its_complicated)\n",
    "print()\n",
    "\n",
    "# a 1-dim tensor; a vector with shape (5,), meaning it's just a plain 'ol array\n",
    "# notice that we've given a name to this variable\n",
    "first_primes = tf.convert_to_tensor(np.array([2, 3, 5, 7, 11], np.int32), name='primes')\n",
    "print(first_primes)\n",
    "print()\n",
    "\n",
    "# a 2-dim tensor; a matrix with shape [2, 3]\n",
    "# notice that the dtype is inferred when we don't specify it\n",
    "my_identity = tf.ones([2,3])\n",
    "print(my_identity)\n",
    "print(\"We can retrieve a numpy array from TensorFlow:\", my_identity.numpy(), \"is a\", type(my_identity.numpy()))\n",
    "print()\n",
    "\n",
    "# a 4-dim tensor with shape [10, 299, 299, 3]\n",
    "blank_image = tf.zeros([10, 299, 299, 3])\n",
    "print(\"tf.shape returns a Tensor:\", tf.shape(blank_image))\n",
    "print(\"while .shape returns a tuple:\", blank_image.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tensor Shape and Broadcasting\n",
    "\n",
    "The shape of a tensor can be checked by calling the [tf.shape](https://www.tensorflow.org/api_docs/python/tf/shape) operation. (Note too that the dimension can be checked with `len(x.shape)`). Tensors can be reshaped with [tf.reshape](https://www.tensorflow.org/api_docs/python/tf/reshape). For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(tf.range(10))\n",
    "print()\n",
    "print(tf.reshape(tf.range(10), (2,5)))  # re-arrange into two rows\n",
    "print()\n",
    "print(tf.reshape(tf.range(10), (1,10))) # add a dimension"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shape of a tensor is important to determine what operations are valid on it. TensorFlow uses the same operational semantics and broadcasting rules as numpy. Operations are generally pointwise, as illustrated by the following multiplication which calculates squares."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Shapes\", tf.range(10).shape, \"and\", tf.range(10).shape, \"gives\", (tf.range(10) * tf.range(10)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Scalars with shape `()` can always be broadcast to operate with anything"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Shapes\", tf.range(10).shape, \"and\", tf.convert_to_tensor(2).shape, \"gives\", (tf.range(10) - tf.convert_to_tensor(2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And similarly, if one of the tensors has a 1 in a dimension and the other doesn't, broadcasting occurs in that dimension. You can assume all tensors' shape begin with an implicit 1, which allows the last example below to work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# these operation will fail if uncommented\n",
    "# tf.range(10) * tf.range(20)\n",
    "# tf.ones([2,10]) * tf.ones([3,10])\n",
    "\n",
    "# note that tf.ones([a,b]) == tf.reshape(tf.ones(a*b), [a,b])\n",
    "print(\"This example broadcasts in the first two dimensions to get shape:\", (tf.ones([3,1,10]) * tf.ones([1,3,10])).shape)\n",
    "print(\"This one works because the first has implicit shape [1,10], giving shape:\", (tf.range(10) * tf.ones([2,10], dtype=tf.int32)).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tensor data type\n",
    "\n",
    "Broadcasting gives a good amount of flexibility to working with shapes, but TensorFlow will never implicitly change data types, leading to hidden errors like the one below. Changing the data type is easy with `tf.cast`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This won't work, even though the shapes line up: tf.range(10) * tf.ones([2,10])\n",
    "# Why?\n",
    "print(\"First type is\", tf.range(10).dtype, \"and second is\", tf.ones([2,10]).dtype)\n",
    "print(\"This works though!\", tf.range(10) * tf.cast(tf.ones([2,10]), tf.int32))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Trainable variables\n",
    "\n",
    "In machine learning, we're interested in using models which are parameterized with trainable variables. We can create variables with `tf.Variable` and by providing the initial value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(tf.Variable(tf.random.normal([10])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use TensorFlow's autodifferentiation, which tracks operations for you and automatically backpropagates gradients when requested. This tracking happens in the context of a [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) and gradients are calculated by a call to `tf.GradientTape.gradient`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = tf.Variable(3.0)\n",
    "with tf.GradientTape() as tape:\n",
    "    y = x ** 2 # calculate x^2\n",
    "\n",
    "# Call gradient with output value(s) and variable(s)\n",
    "grad = tape.gradient(y,x)\n",
    "\n",
    "print(grad) # we expect this to be 2*x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Gradient descent optimization\n",
    "\n",
    "Next, we're going to take a big step to using gradient descent to solve a system of linear equations, `Ax=b`, like you might see in a linear algebra class. We'll generate fixed values for A and b, and make `x` a variable we can learn. Then, we'll calculate an error function (the `difference_sq` line below), and use the gradients of the error with respect to `x` to update it to make the error smaller on the next run. We can do this for all the indices of the `x` vector simultaneously."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "learning_rate = 0.25\n",
    "num_iterations = 20\n",
    "\n",
    "# the optimizer allows us to apply gradients to update variables\n",
    "optimizer = tf.keras.optimizers.Adam(learning_rate)\n",
    "\n",
    "# Create a fixed matrix, A\n",
    "A = tf.random.normal([4,4])\n",
    "# Create x using an arbitrary initial value\n",
    "x = tf.Variable(tf.ones([4, 1]))\n",
    "# Create a fixed vector b\n",
    "b = tf.random.normal([4, 1])\n",
    "\n",
    "# Check the initial values\n",
    "print(\"A:\", A.numpy())\n",
    "print(\"b:\", b.numpy())\n",
    "\n",
    "print(\"Initial x:\", x.numpy())\n",
    "print(\"Ax:\", (A@x).numpy())\n",
    "print()\n",
    "\n",
    "# We want Ax - b = 0, so we'll try to minimize its value\n",
    "for step in range(num_iterations):\n",
    "    print(\"Iteration\", step)\n",
    "    with tf.GradientTape() as tape:\n",
    "        # Calculate A*x\n",
    "        product = tf.matmul(A, x)\n",
    "        # calculat the loss value we want to minimize\n",
    "        # what happens if we don't use the square here?\n",
    "        difference_sq = tf.math.square(product - b)\n",
    "        print(\"Squared error:\", tf.norm(tf.math.sqrt(difference_sq)).numpy())\n",
    "        # calculate the gradient\n",
    "        grad = tape.gradient(difference_sq, [x])\n",
    "        print(\"Gradients:\")\n",
    "        print(grad)\n",
    "        # update x\n",
    "        optimizer.apply_gradients(zip(grad, [x]))\n",
    "        print()\n",
    "        \n",
    "# Check the final values\n",
    "print(\"Optimized x\", x.numpy())\n",
    "print(\"Ax\", (A@x).numpy()) # Should be close to the value of b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use `.numpy()` to get the value of the tensor for a cleaner output message. If the number of iterations is large enough, we will eventually learn a vector for `x` which approximately satisfies the system of equations.\n",
    "\n",
    "### Homework\n",
    "\n",
    "Your homework is to specify an equation that you will solve with gradient descent (as above). Then, play around with the learning rate and number of update iterations to get an intuitive understanding of how they affect your solver. Write up a paragraph or two describing your equation, how learning rate and number of iterations gave a better or worse solution, and with your intuition for why. Submit this writeup in a `.pdf` with a `.py` of your code.\n",
    "\n",
    "I'm expecting this to take about an hour (or less if you're experienced). Feel free to use any code from this or previous hackathons. If you don't understand how to do any part of this or if it's taking you longer than that, please let me know in office hours or by email (both can be found on the syllabus). I'm also happy to discuss if you just want to ask more questions about anything in this notebook!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dl_notebooks",
   "language": "python",
   "name": "dl_notebooks"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
